OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia

Internal identifier: 000404 (Main/Exploration); previous: 000403; next: 000405

Authors: Aastha Madaan [Japan]; Wanming Chu [Japan]; Subhash Bhalla [Japan]

Source: Lecture Notes in Computer Science [0302-9743]; 2011

RBID : ISTEX:7EF68FF10F53FA37822A74035B17C3BB237AE5C9

Abstract

The World Wide Web has become the largest source of information. Consequently, web-based information retrieval, information extraction, automatic page adaptation, and deep-web querying are gaining importance, and the need for information retrieval applications is increasing. Traditional information retrieval techniques have been applied to cope with the ever-expanding information on the internet, but such techniques are sometimes time-consuming and laborious, and the results obtained may be unsatisfactory. This study is an attempt to query web pages such as the MedlinePlus medical encyclopedia by segmenting them. It summarizes the existing approaches to web page segmentation from the perspective of “structure realization for improved querying” on the web. It proposes a new algorithm, VisHue, for web page segmentation based on visual cues and heuristics, and uses the hierarchical structure it generates to develop the Query by Segment or Tag (QBT) query interface. This interface is close to the end user and exploits the relationships among the various content groups within a web page. Such an improved query interface enables the user to perform in-depth querying, a step beyond page-level search.

URL: https://api.istex.fr/document/7EF68FF10F53FA37822A74035B17C3BB237AE5C9/fulltext/pdf
DOI: 10.1007/978-3-642-25731-5_9


Affiliations: University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken, Japan (all three authors)



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
<author>
<name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
</author>
<author>
<name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
</author>
<author>
<name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:7EF68FF10F53FA37822A74035B17C3BB237AE5C9</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-25731-5_9</idno>
<idno type="url">https://api.istex.fr/document/7EF68FF10F53FA37822A74035B17C3BB237AE5C9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000977</idno>
<idno type="wicri:Area/Istex/Curation">000966</idno>
<idno type="wicri:Area/Istex/Checkpoint">000060</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Madaan A:vishue:web:page</idno>
<idno type="wicri:Area/Main/Merge">000409</idno>
<idno type="wicri:Area/Main/Curation">000404</idno>
<idno type="wicri:Area/Main/Exploration">000404</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
<author>
<name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">7EF68FF10F53FA37822A74035B17C3BB237AE5C9</idno>
<idno type="DOI">10.1007/978-3-642-25731-5_9</idno>
<idno type="ChapterID">9</idno>
<idno type="ChapterID">Chap9</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The World Wide Web has become the largest source of information. Consequently, web-based information retrieval, information extraction, automatic page adaptation, and deep-web querying are gaining importance, and the need for information retrieval applications is increasing. Traditional information retrieval techniques have been applied to cope with the ever-expanding information on the internet, but such techniques are sometimes time-consuming and laborious, and the results obtained may be unsatisfactory. This study is an attempt to query web pages such as the MedlinePlus medical encyclopedia by segmenting them. It summarizes the existing approaches to web page segmentation from the perspective of “structure realization for improved querying” on the web. It proposes a new algorithm, VisHue, for web page segmentation based on visual cues and heuristics, and uses the hierarchical structure it generates to develop the Query by Segment or Tag (QBT) query interface. This interface is close to the end user and exploits the relationships among the various content groups within a web page. Such an improved query interface enables the user to perform in-depth querying, a step beyond page-level search.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
</noRegion>
<name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
<name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
</country>
</tree>
</affiliations>
</record>
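
The record above can also be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and a hand-trimmed copy of the `<titleStmt>` element; the names `FRAGMENT` and `read_title_and_authors` are illustrative and are not part of Dilib or Wicri. The full record uses the `wicri:` prefix without declaring a namespace, so a namespace-aware parser would likely reject it unless a declaration is added first; the fragment below avoids that prefix.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed copy of the <titleStmt> element from the record
# (most attributes dropped for brevity; this is an assumption of the sketch).
FRAGMENT = """
<titleStmt>
  <title xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
  <author><name sortKey="Madaan, Aastha">Aastha Madaan</name></author>
  <author><name sortKey="Chu, Wanming">Wanming Chu</name></author>
  <author><name sortKey="Bhalla, Subhash">Subhash Bhalla</name></author>
</titleStmt>
"""


def read_title_and_authors(xml_text):
    """Return (title, [author names]) from a <titleStmt> fragment."""
    root = ET.fromstring(xml_text.strip())
    # The title is a direct child of <titleStmt>.
    title = root.findtext("title")
    # Each <author> holds one <name> whose text is the display name.
    authors = [name.text for name in root.findall("author/name")]
    return title, authors


title, authors = read_title_and_authors(FRAGMENT)
print(title)
for author in authors:
    print(author)
```

The same path expressions should apply to the `<titleStmt>` of the full record once the `wicri:` prefix is bound to a namespace URI of your choosing.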

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000404 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000404 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:7EF68FF10F53FA37822A74035B17C3BB237AE5C9
   |texte=   VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024